Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks
Authors
Abstract
Meta reinforcement learning (meta-RL) aims to learn a policy that solves a set of training tasks simultaneously and adapts quickly to new tasks. It requires massive amounts of data drawn from training tasks to infer the common structure shared among tasks. Without heavy reward engineering, the sparse rewards in long-horizon tasks exacerbate the problem of sample efficiency in meta-RL. Another challenge in meta-RL is the discrepancy of difficulty level among tasks, which might cause one easy task to dominate learning of the shared policy and thus preclude policy adaptation to new tasks. This work introduces a novel objective function to learn an action translator among training tasks. We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy, and that our objective function (approximately) upper bounds the value difference. We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training. Our approach empirically improves the sample efficiency and performance of meta-RL algorithms on sparse-reward tasks.
Similar resources
Inter-Task Action Correlation for Reinforcement Learning Tasks
Reinforcement learning (RL) problems (Sutton & Barto 1998) are characterized by agents making decisions attempting to maximize total reward, which may be time delayed. RL problems contrast with classical planning problems in that agents do not know a priori how their actions will affect the world. RL differs from supervised learning because agents are never given training examples ...
Learning by Playing - Solving Sparse Reward Tasks from Scratch
We propose Scheduled Auxiliary Control (SAC-X), a new learning paradigm in the context of Reinforcement Learning (RL). SAC-X enables learning of complex behaviors – from scratch – in the presence of multiple sparse reward signals. To this end, the agent is equipped with a set of general auxiliary tasks that it attempts to learn simultaneously via off-policy RL. The key idea behind our method is...
Reward, Motivation, and Reinforcement Learning
There is substantial evidence that dopamine is involved in reward learning and appetitive conditioning. However, the major reinforcement learning-based theoretical models of classical conditioning (crudely, prediction learning) are actually based on rules designed to explain instrumental conditioning (action learning). Extensive anatomical, pharmacological, and psychological data, particularly ...
Compatible Reward Inverse Reinforcement Learning
PROBLEM • Inverse Reinforcement Learning (IRL) problem: recover a reward function explaining a set of expert’s demonstrations. • Advantages of IRL over Behavioral Cloning (BC): – Transferability of the reward. • Issues with some IRL methods: – How to build the features for the reward function? – How to select a reward function among all the optimal ones? – What if no access to the environment? ...
An Average - Reward Reinforcement Learning
Recently, there has been growing interest in average-reward reinforcement learning (ARL), an undiscounted optimality framework that is applicable to many different control tasks. ARL seeks to compute gain-optimal control policies that maximize the expected payoff per step. However, gain-optimality has some intrinsic limitations as an optimality criterion, since for example, it cannot distinguish ...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i6.20635